NVIDIA Launches Granary Dataset to Enhance Multilingual Speech AI
NVIDIA has introduced Granary, an open dataset for multilingual speech recognition and translation covering 25 European languages. Developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, it targets the shortage of training data for low-resource languages such as Croatian, Estonian, and Maltese.
Granary contains roughly a million hours of audio, about 650,000 hours for speech recognition and 350,000 for speech translation, and is now available on Hugging Face. Together with NVIDIA's Canary and Parakeet models, the open dataset gives developers a basis for building multilingual speech applications, from chatbots to real-time translation services.
The dataset was built with NVIDIA's NeMo Speech Data Processor toolkit, which turns raw audio into structured training data, a step intended to make high-quality speech models practical for languages that previously lacked enough labeled data.